1.  Introduction

The Kalman filter (KF) has become an important and powerful tool for the
statistician. Recently, many authors have exploited the state-space model and
KF recursions for estimation and prediction of time series. For example, Jones
(1980) and Harvey and Pierse (1984) use the KF to obtain maximum likelihood
estimates of the parameters of ARMA processes when observations are missing. It
has been suggested by Morrison and Pike (1977) and others (cf. Kendall (1973)) that
the KF model provides an appropriate setting within which to parametrize smoothing
and forecasting problems.

To be specific, we suppose that a $p \times 1$ vector time series
$\{y_t;\ t = 0, \pm 1, \pm 2, \ldots\}$ is being generated by the following
dynamic system

$$ y_t = x_t + v_t \qquad (1.1) $$

where $x_t$ is an unobservable zero-mean, $p \times 1$ vector stationary stochastic
signal, and $v_t$ is $p \times 1$ Gaussian white noise, $v_t \sim N(0, R)$. The
dynamics of the stationary signal are given by

$$ x_t = \Phi x_{t-1} + w_t \qquad (1.2) $$

where $\Phi$ is the $p \times p$ transition matrix and $w_t$ is $p \times 1$ Gaussian
white noise, $w_t \sim N(0, Q)$. Furthermore, $\{v_t\}$ and $\{w_t\}$ are mutually
independent and we assume that the system and the filter have reached steady state.
We remark that the superficially more general model in which (1.1) is replaced by

$$ y_t = M x_t + v_t $$

where $M$ is a nonsingular known design matrix may be reduced to (1.1) by an
appropriate change of bases.
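A minimal simulation sketch of the model (1.1)-(1.2) for the univariate case $p = 1$ may help fix ideas. The function name, the burn-in length used to approximate stationarity, and the parameter values are illustrative assumptions and are not taken from the paper.

```python
import random


def simulate_state_space(n, phi, q, r, seed=0, burn_in=200):
    """Simulate y_t = x_t + v_t, x_t = phi * x_{t-1} + w_t for p = 1.

    q and r are the variances of w_t and v_t. The burn-in period is an
    assumed device to let the signal get close to its stationary law.
    """
    rng = random.Random(seed)
    x = 0.0
    xs, ys = [], []
    for t in range(n + burn_in):
        x = phi * x + rng.gauss(0.0, q ** 0.5)  # state equation (1.2)
        y = x + rng.gauss(0.0, r ** 0.5)        # observation equation (1.1)
        if t >= burn_in:
            xs.append(x)
            ys.append(y)
    return xs, ys


# Hypothetical parameter values, chosen only for illustration.
signal, data = simulate_state_space(n=500, phi=0.8, q=1.0, r=0.5)
```

Only `data` would be available to the statistician; `signal` is returned here so that filter output can be compared with the unobservable truth in experiments.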

Given the parameters of the model, namely, $\Phi$, $Q$ and $R$, one may obtain the
minimum mean square error filter and forecasts for the system via the KF recursions.
However, the parameters are rarely known and hence must be estimated. Moreover,
since the forecasts are based on the estimate of the state transition matrix $\Phi$,
the precision of the estimate must be evaluated. We propose the bootstrap as a
method to evaluate the precision of the transition parameter estimates, in
particular, to provide robustness against departure from normality in the Gaussian
state and observation errors, and to assist in estimating forecast errors.

In most cases, parameter estimation for the KF model has been accomplished
by maximum likelihood techniques involving the use of scoring or Newton-Raphson
techniques to solve the nonlinear equations which result from differentiating the
log-likelihood function (cf. Gupta and Mehra (1974)). Several examples have been
given, notably by Ledolter (1979) and Goodrich and Caines (1979), which demonstrate
the feasibility of these methods for several specific cases. Maximum likelihood
estimation of parameters in the autoregressive moving average (ARMA) model
expressed in state-space form has been considered by Harvey and Phillips (1979) and
Jones (1980). The methods in the above references typically involve using a set of
recursions for the derivatives of the log-likelihood and require that one invert
a matrix of partial derivatives at each step. When the matrix of partials (or its
expectation) is well behaved, the Newton-Raphson and scoring procedures enjoy
quadratic convergence in the neighborhood of the maximum and one has a ready-made
estimator for the covariance matrix of the parameters. We discuss the
Newton-Raphson procedure for the KF model in Section 4.

Another maximum likelihood technique uses the EM algorithm to estimate the
parameters of the KF model (cf. Shumway and Stoffer (1982)). Although this
procedure is relatively simple and always increases the likelihood, the matrix of
partials is never computed so that it is not available for providing estimates of
the standard errors. However, the bootstrap may be able to augment this procedure
by providing an approximation to the distribution of the parameter estimates.


The maximum likelihood techniques mentioned above require that one supply
initial estimates (or starting values) which are sufficiently close to the true
parameters. As will be seen, we shall require initial consistent estimates of
$\Phi$, $Q$ and $R$ which converge faster than $n^{-1/4}$. Such estimates have been
given by Anderson et al. (1969). Their estimates, which are computationally simple
to obtain, are discussed in Section 4. Further in Section 4, it is shown that when
the aforementioned initial estimates are used, the one-step Newton-Raphson yields
an efficient estimate of the transition parameter $\Phi$ when the noise processes
are Gaussian. We make bootstrapping the Newton-Raphson estimate of $\Phi$ appealing
by showing, in Section 5, that the bootstrap gives the right answers with large
samples. That is, the bootstrap is at least as sound as the conventional asymptotics.

Finally, in Section 6, we give empirical evidence of the bootstrap's
importance in Kalman filtering by comparing the bootstrap to the Newton-Raphson in
the cases when the likelihood is Gaussian and when the likelihood is contaminated
Gaussian.

Our goal is to estimate the precision of the parameter estimate of $\Phi$ as well
as the precision of the forecasts $x_{n+1}^{n}, \ldots, x_{n+k}^{n}$. The techniques
used here are based on the bootstrap (cf. Efron (1979)) and the methods used in
bootstrapping least squares estimates discussed in Bickel and Freedman (1981),
Freedman (1981) and Freedman and Peters (1984). It is noted in the above references
that in regression models (static or dynamic), it is appropriate to resample the
centered residuals after estimating the parameters. This is not possible in the
present model (1.1) and (1.2) since the signal is not observable. However, we may
base the procedure on the innovations which are obtained by taking the conditional
expectation of the signal given the data. The bootstrap procedure will involve the
resampling of the innovation sequence

$$ r_t^{t-1} = y_t - x_t^{t-1} \qquad (1.3) $$

where by $x_t^{t-1}$ we mean $E(x_t \mid y_{t-1}, y_{t-2}, \ldots)$. Of course,
$x_t^{t-1}$ will be obtained recursively via the KF.

Under the conditions stated in the next section we will be able to put this
problem into the nonlinear regression context as discussed in Efron (1979,
Section 7). That is, we may write

$$ y_t = g_t(\cdot) + \varepsilon_t $$

where the $\varepsilon_t$ are iid zero-mean random vectors (namely, the innovations)
and $g_t(\cdot)$ is a particularly complicated, but known, nonlinear function of the
parameters $\Phi$, $Q$, and $R$, the signal $x_t$, and the data
$y_{t-1}, y_{t-2}, \ldots$. In particular, $g_t(\cdot) = x_t^{t-1}$, the filtered
value of the signal.

In the next section we give conditions under which we are able to bootstrap
the innovations, (1.3). The bootstrap procedure is given in Section 3.


2.  The  Steady-State  Innovation  Sequence 

Throughout the remainder of this paper we make the following assumptions on
the $p \times p$ parameter matrices: (A1) $Q$ and $R$ are positive definite, and
(A2) $\Phi$ is nonsingular with spectral norm, $\rho(\Phi)$, less than unity. These
conditions ensure the asymptotic global stability of the KF (cf. Deyst and Price
(1968)).

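The steady-state KF recursions are given by (cf. Jazwinski (1970))

$$ K = P(P+R)^{-1}, \qquad (2.1a) $$

$$ P = \Phi[P - P(P+R)^{-1}P]\Phi' + Q, \qquad (2.1b) $$

$$ x_t^{t-1} = \Phi x_{t-1}^{t-1}, \qquad (2.1c) $$

$$ x_t^{t} = x_t^{t-1} + K(y_t - x_t^{t-1}). \qquad (2.1d) $$

In the KF above, $K$ is the steady-state gain matrix, $P$ is the steady-state
prediction error covariance, $P = E\{(x_t - x_t^{t-1})(x_t - x_t^{t-1})'\}$, and
$x_t^{t-1} = E(x_t \mid y_{t-1}, y_{t-2}, \ldots)$ is the steady-state filter
estimate of $x_t$ based on the data $y_{t-1}, y_{t-2}, \ldots$.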
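The steady-state quantities can be computed numerically. The sketch below treats the univariate case $p = 1$, solves the scalar form of (2.1b) by fixed-point iteration, and then runs the standard predict and update steps of a steady-state filter to produce innovations. The function names, the stopping tolerance, and the choice of starting value are assumptions made for illustration.

```python
def steady_state(phi, q, r, tol=1e-12, max_iter=10000):
    """Iterate the scalar version of (2.1b) to a fixed point; return (P, K)."""
    p = q  # any positive starting value would do; q is an assumed choice
    for _ in range(max_iter):
        p_new = phi * phi * (p - p * p / (p + r)) + q
        converged = abs(p_new - p) < tol
        p = p_new
        if converged:
            break
    return p, p / (p + r)  # gain K = P (P + R)^{-1}, as in (2.1a)


def innovations(ys, phi, q, r, x0=0.0):
    """Run the steady-state filter; return the innovations and the last filtered state."""
    _, k = steady_state(phi, q, r)
    x_filt = x0
    rs = []
    for y in ys:
        x_pred = phi * x_filt       # one-step prediction of the signal
        rt = y - x_pred             # innovation y_t - x_t^{t-1}
        x_filt = x_pred + k * rt    # measurement update with the steady-state gain
        rs.append(rt)
    return rs, x_filt
```

For example, `steady_state(0.8, 1.0, 0.5)` returns a pair `(P, K)` for which `P` satisfies the scalar Riccati equation to within the tolerance and `K` lies strictly between 0 and 1.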

Lemma 2.1 Under steady-state and optimal filtering, the $p \times 1$ vector
innovation sequence

$$ r_t^{t-1} = y_t - x_t^{t-1} \qquad (2.2) $$

is a zero-mean, white Gaussian sequence with covariance matrix $P+R$.

Proof Write $r_t^{t-1} = e_t + v_t$ where $e_t = x_t - x_t^{t-1}$ and note that
$E(e_t) = E(v_t) = 0$. The $r_t^{t-1}$ are Gaussian since they are linear
combinations of Gaussian random vectors. To establish the orthogonality of the
innovations, it is easy to see that while $r_s^{s-1}$ $(= y_s - x_s^{s-1})$ is in
the linear space spanned by $\{y_s, y_{s-1}, \ldots\}$, $r_t^{t-1} = e_t + v_t$
is orthogonal to the linear space spanned by $\{y_{t-1}, y_{t-2}, \ldots\}$. Hence,
for $s < t$,

$$ E(r_s^{s-1} r_t^{t-1\prime}) = 0. $$

Also, since $e_t$ and $v_t$ are uncorrelated we have that

$$ \mathrm{Cov}(r_t^{t-1}) = \mathrm{Cov}(e_t) + \mathrm{Cov}(v_t) = P + R. $$

□


As a final remark, we note that via (2.1c), (2.1d), and (2.2) we may write $y_t$
in terms of the steady-state innovations as

$$ y_t = \sum_{j=1}^{\infty} \Phi^{j} K r_{t-j}^{t-j-1} + r_t^{t-1}
       = \sum_{j=1}^{t-1} \Phi^{j} K r_{t-j}^{t-j-1} + r_t^{t-1} + \Phi^{t} x_0^{0} \qquad (2.3) $$

which follows from the fact that $\|\Phi^{j}\| \to 0$ exponentially fast as
$j \to \infty$ since $\rho(\Phi) < 1$, where $\|\Phi\|^2 = \mathrm{trace}(\Phi'\Phi)$.
This result will be useful in establishing the bootstrap procedure.
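The finite-sum form of the innovations representation can be checked numerically. The sketch below, for $p = 1$, filters a short series with a fixed gain and then rebuilds the data from the innovations via $y_t = \sum_{j=1}^{t-1} \Phi^j K r_{t-j} + r_t + \Phi^t x_0$. The identity is purely algebraic, so it holds for any gain; the function names and the numbers used are illustrative assumptions.

```python
def filter_innovations(ys, phi, k, x0=0.0):
    """Innovations of a fixed-gain filter started at x0 (scalar case)."""
    x_filt, rs = x0, []
    for y in ys:
        x_pred = phi * x_filt
        rs.append(y - x_pred)
        x_filt = x_pred + k * rs[-1]
    return rs


def rebuild_from_innovations(rs, phi, k, x0=0.0):
    """y_t = sum_{j=1}^{t-1} phi^j k r_{t-j} + r_t + phi^t x0, for t = 1..n."""
    ys = []
    for t in range(1, len(rs) + 1):
        # rs is 0-indexed, so r_{t-j} is rs[t - j - 1]
        past = sum(phi ** j * k * rs[t - j - 1] for j in range(1, t))
        ys.append(past + rs[t - 1] + phi ** t * x0)
    return ys


ys = [0.3, -1.2, 0.7, 2.1, -0.4]
rs = filter_innovations(ys, phi=0.8, k=0.6, x0=0.5)
rebuilt = rebuild_from_innovations(rs, phi=0.8, k=0.6, x0=0.5)
```

Here `rebuilt` agrees with `ys` up to floating-point rounding, which is the property the bootstrap exploits when it regenerates data from resampled innovations.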
3.  The Bootstrap Estimate of Precision


As previously mentioned, the bootstrap technique will be employed by resampling
the steady-state innovation sequence. Recall that under optimal filtering the
innovation sequence $r_t^{t-1}$, $t = 1, \ldots, n$, is $p \times 1$ Gaussian white
noise, $r_t^{t-1} \sim N_p(0, P+R)$, where $P$ is the steady-state error covariance
matrix given in (2.1b).

The bootstrap procedure begins by estimating the parameters
$\Theta = \{\Phi, Q, R\}$ of the model (1.1), (1.2) by the procedures mentioned in
the Introduction. We shall discuss the particulars in Section 4. Call these
estimates $\hat\Theta = \{\hat\Phi, \hat Q, \hat R\}$.

From these preliminary estimates obtain a suboptimal innovation sequence by
filtering (cf. 2.1) under $\hat\Theta$. Call this innovation sequence
$\hat r_t^{t-1}$, $t = 1, \ldots, n$. Make the sequence
$\{\hat r_t^{t-1}\}_{t=1}^{n}$ identically distributed with distribution equal
to the empirical distribution by putting mass $n^{-1}$ on each innovation
$\hat r_t^{t-1}$, $t = 1, \ldots, n$. Next, draw a "bootstrap sample" of
innovations, $r_t^*$, $t = 1, \ldots, n$, by independent random sampling of the
residuals $\hat r_t^{t-1}$. That is, sample $n$ times, with replacement, from
$\{\hat r_1, \hat r_2, \ldots, \hat r_n\}$. From this we obtain a "bootstrap
sample" of data $y_1^*, y_2^*, \ldots, y_n^*$ by setting (cf. 2.3)

$$ y_t^* = \sum_{j=1}^{t-1} \hat\Phi^{j} \hat K r_{t-j}^* + r_t^*, \qquad t = 1, \ldots, n, \qquad (3.1) $$

where $\hat K$ is the estimated gain matrix obtained via filtering under parameters
$\hat\Theta$.

We make the following suggestions before proceeding with step (3.1). First,
as suggested in Freedman (1981), one should center the residuals $\hat r_t^{t-1}$
before resampling them so that the empirical distribution puts mass $n^{-1}$ on
$\hat r_t^{t-1} - \bar r$, where $\bar r = n^{-1} \sum_{t=1}^{n} \hat r_t^{t-1}$.
Second, we suggest checking whether the innovations are nearly white. It is known
that a suboptimal filter produces correlated innovations (see, for example, Mehra
(1970)) and hence this is a check on the "goodness" of the estimates. Various
methods are available for testing the whiteness of the innovations, many of which
are listed in Mehra (1970).
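The resampling step can be sketched for the univariate case as follows: center the estimated innovations, draw $n$ of them with replacement, and rebuild a bootstrap series by accumulating $\sum_j \hat\Phi^j \hat K r^*_{t-j}$ recursively. The function name and the use of Python's `random.Random` as the source of randomness are assumptions of this sketch, not part of the paper.

```python
import random


def bootstrap_data(innov, phi_hat, k_hat, rng):
    """Resample centered innovations and rebuild a bootstrap series (p = 1)."""
    n = len(innov)
    mean = sum(innov) / n
    centered = [r - mean for r in innov]               # centering before resampling
    r_star = [rng.choice(centered) for _ in range(n)]  # sampling with replacement
    y_star, x_pred = [], 0.0
    for r in r_star:
        y_star.append(x_pred + r)                # y*_t = (sum over the past) + r*_t
        x_pred = phi_hat * (x_pred + k_hat * r)  # accumulates sum_j phi^j K r*_{t-j}
    return y_star, r_star


# Hypothetical inputs, for illustration only.
rng = random.Random(3)
y_star, r_star = bootstrap_data([0.5, -1.0, 2.0, 0.1], phi_hat=0.7, k_hat=0.4, rng=rng)
```

The recursion avoids the quadratic cost of evaluating the sum directly: the first bootstrap observation equals its innovation, and the second equals $\hat\Phi \hat K r_1^* + r_2^*$.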


Now, suppose that the bootstrap data $y_1^*, \ldots, y_n^*$ come from the model

$$ y_t^* = x_t^* + v_t^*, \qquad t \ge 1, \qquad (3.2a) $$

$$ x_t^* = \Phi^* x_{t-1}^* + w_t^*, \qquad (3.2b) $$

where $v_t^*$ is $p \times 1$ Gaussian white noise, $v_t^* \sim N(0, R^*)$, and is
independent of $w_t^*$, which is $p \times 1$ Gaussian white noise,
$w_t^* \sim N(0, Q^*)$. Assume the parameters $\Theta^* = \{\Phi^*, Q^*, R^*\}$ are
unknown and to be estimated.

The parameters $\Theta^*$ are then estimated by the initial optimal procedure to
produce estimates $\hat\Theta_1^* = \{\hat\Phi_1^*, \hat Q_1^*, \hat R_1^*\}$. Then,
the suboptimal innovation sequence is resampled and the bootstrap procedure is
reiterated.

The entire process is repeated some large number $L$ of times, obtaining $L$
bootstrap replications $\hat\Theta_1^*, \hat\Theta_2^*, \ldots, \hat\Theta_L^*$. The
distribution of the errors

$$ \hat\Phi_\ell^* - \hat\Phi, \qquad \ell = 1, \ldots, L, \qquad (3.3) $$

is then computed to give an approximation to the distribution of

$$ \hat\Phi - \Phi. \qquad (3.4) $$

The bootstrap distribution of the errors (3.3) may then be used to obtain confidence
regions and tests of hypotheses about the parameters $\Phi$. Justification of this
procedure is given in Section 5.
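One way the bootstrap errors might be turned into a confidence interval in the scalar case is sketched below. It uses the so-called basic bootstrap construction: if the replicate errors mimic the estimation error, then quantiles of the former bound the latter. The paper does not specify this construction, so the function name, the quantile rule, and the example numbers are all assumptions.

```python
def bootstrap_interval(phi_hat, phi_stars, level=0.90):
    """Basic bootstrap interval for a scalar parameter from L replicates."""
    errors = sorted(s - phi_hat for s in phi_stars)  # replicate minus estimate
    last = len(errors) - 1
    lo_idx = int(round((1.0 - level) / 2.0 * last))  # simple order-statistic rule
    hi_idx = last - lo_idx
    e_lo, e_hi = errors[lo_idx], errors[hi_idx]
    # If (replicate - estimate) behaves like (estimate - truth), then
    # e_lo <= estimate - truth <= e_hi with probability close to `level`.
    return phi_hat - e_hi, phi_hat - e_lo


# Hypothetical replicates, for illustration only.
replicates = [0.5 + 0.01 * i for i in range(-10, 11)]
lower, upper = bootstrap_interval(0.5, replicates, level=0.90)
```

With these 21 equally spaced replicates the interval is approximately (0.41, 0.59). In the multivariate case one would work with regions rather than intervals, which this sketch does not attempt.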

Forecasting $k$ steps into the future, say

$$ x_{n+j}^{n} = E(x_{n+j} \mid y_n, y_{n-1}, \ldots), \qquad j = 1, 2, \ldots, k, $$

is easily accomplished via the filter equations (2.1), namely

$$ x_{n+j}^{n} = \Phi^{j} x_{n}^{n}, \qquad j = 1, \ldots, k. \qquad (3.5) $$

The suboptimal forecasts will be obtained via the KF under parameter estimates
$\hat\Theta$ so that

$$ \hat x_{n+1}^{n}, \ldots, \hat x_{n+k}^{n} \qquad (3.6) $$

will be the actual forecasts. If at each bootstrap replication we obtain

$$ \{\hat x_{n+1}^{n*}, \ldots, \hat x_{n+k}^{n*}\}^{(1)}, \ldots,
   \{\hat x_{n+1}^{n*}, \ldots, \hat x_{n+k}^{n*}\}^{(L)} \qquad (3.7) $$

we may extract the empirical distribution of the forecast residuals

$$ \hat x_{n+j}^{n*} - \hat x_{n+j}^{n}, \qquad j = 1, \ldots, k, \qquad (3.8) $$

which can then be used to approximate the distribution of the actual forecast
errors

$$ \hat x_{n+j}^{n} - x_{n+j}^{n}. \qquad (3.9) $$

From the distributions of (3.8) we may obtain prediction regions for the forecasts
(3.5).
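A scalar sketch of the forecasting step follows. It assumes the usual steady-state forecast recursion, in which each further step multiplies the previous forecast by the transition parameter, and it arranges bootstrap forecast residuals as one row per replication. The function names and the numbers in the usage lines are hypothetical.

```python
def forecasts(x_filt_n, phi, k):
    """k-step forecasts from the last filtered state (p = 1).

    Assumes the recursion forecast_j = phi * forecast_{j-1}, started at x_filt_n.
    """
    out, x = [], x_filt_n
    for _ in range(k):
        x = phi * x
        out.append(x)
    return out


def forecast_residuals(actual, replicates):
    """Bootstrap forecast minus actual forecast, per replication and horizon."""
    return [[b - a for a, b in zip(actual, rep)] for rep in replicates]


actual = forecasts(2.0, 0.5, 3)
residuals = forecast_residuals(actual[:2], [[1.5, 0.5], [0.0, 1.0]])
```

Column $j$ of `residuals` then holds the empirical distribution of the horizon-$j$ bootstrap forecast residuals, from which quantiles for a prediction interval could be read off.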


4.  Parameter  Estimation 

In this section we give the details of the consistent and efficient estimation
of the parameters of the KF model (1.1), (1.2). Recall that the system is in
steady-state and the parameters $\Theta = \{\Phi, Q, R\}$ satisfy the conditions
(A1) and (A2) given in Section 2. First, we discuss the initial consistent
estimates given in Anderson et al. (1969) and give related results. Second, we
discuss the Newton-Raphson procedure for the KF model and in particular we show
that the procedure is sound for the given model. We note that the assumption of
normality of the error processes is not needed to establish the results of this
section.


4.1  Initial Consistent Estimates

The following estimates are given in Anderson et al. (1969). Let

$$ \hat\Phi_n = \Big(\sum_{t=3}^{n} y_t y_{t-2}'\Big)\Big(\sum_{t=3}^{n} y_{t-1} y_{t-2}'\Big)^{+}, \qquad (4.1) $$

where by $+$ we mean generalized inverse. Further, define


S„(1)  -  n- 


n  >  3,  i-1,2 


and  set 


and 


n  ^  n  n  n  n  n 


Q  =  B  (1)  -  R  -  I  R  $' 
n  n  n  n  n 


(4.2) 


(4.3) 


provided that $\hat\Phi_n$ is invertible.
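For the univariate case, the initial estimate of the transition parameter is a ratio of a lag-two cross-product sum to a lag-one cross-product sum over $t = 3, \ldots, n$. The sketch below implements only that ratio; the function name is an assumption, and the scalar generalized inverse is taken to be $1/d$ when $d \neq 0$ and $0$ otherwise.

```python
def phi_initial(ys):
    """Scalar moment estimate: sum y_t y_{t-2} / sum y_{t-1} y_{t-2}, t = 3..n."""
    idx = range(2, len(ys))  # 0-indexed positions of t = 3, ..., n
    num = sum(ys[t] * ys[t - 2] for t in idx)
    den = sum(ys[t - 1] * ys[t - 2] for t in idx)
    # Scalar generalized inverse: 1/den when den is nonzero, 0 otherwise.
    return num / den if den != 0.0 else 0.0


# A noise-free geometric sequence recovers its ratio exactly.
estimate = phi_initial([1.0, 0.5, 0.25, 0.125])
```

Observation noise contaminates only the lag-zero moment, which is why lags one and two are used here.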

Anderson et al. (1969) show the strong consistency ($n \to \infty$) of
$\hat\Phi_n$, $\hat Q_n$, and $\hat R_n$ for $\Phi$, $Q$ and $R$, respectively,
under the model assumptions (1.1), (1.2). To establish the bootstrap principle in
Section 5, we need the following results which exhibit the behavior of the
suboptimal filter and forecasts (see Anderson et al. (1969), Theorems 2.4, 2.5 and
Corollary 2.4). Denote positive (semi)-definite by p.(s.)d.


Result 4.1. If $Q$ is p.d. and if $\hat\Phi_n$, $\hat Q_n$, $\hat R_n$ are strongly
consistent estimates of $\Phi$, $Q$, $R$, respectively, for which $\hat Q_n$ is p.d.
and $\hat R_n$ is p.s.d. for all $n \ge 1$, then $\hat P_n \to P$ and
$\hat K_n \to K$ a.s. as $n \to \infty$, where $\hat P_n$ and $\hat K_n$ are the
estimates of the steady-state filter covariance and gain matrices, respectively.


Result 4.2. Let the hypotheses of Result 4.1 be satisfied, and suppose
$\rho(\Phi) < 1$. If, in addition, $E\{|v_t|^k\} < \infty$ and
$E\{|w_t|^k\} < \infty$ for some $k \ge 1$, then


11m  _  n 

n-HJo 


-1 


r 

If. 


1 1  -ti 

■ilv^l 


a.  s , 


and 


lim  n 


-1 


y''  -1x1  -  0 

^t»l'  t+d  n  t' 


a.  s. 


for  any  integer  i 

Anderson et al. (1969) do not establish the asymptotic normality of the
estimates given in (4.1), (4.2) and (4.3). This, however, is easily accomplished via
the following theorem which may be found in parts in Hannan (1970, Chapter 4).

First, we need some definitions. If
$u_t = \sum_{j=-\infty}^{\infty} A_j e_{t-j}$, $\sum_j \|A_j\| < \infty$, and the
$e_t$ are independent and identically distributed with mean vector zero and finite
covariance matrix $G$, then we say that $u_t$ is generated by a linear process.
Define the sample autocovariance function of $u_t$ from a sample of length $n$ to be

$$ C_n(h) = n^{-1} \sum_{t=h+1}^{n} u_t u_{t-h}' \qquad (4.4) $$

and the autocovariance function of $u_t$ to be $\Gamma(h) = E\{u_t u_{t-h}'\}$.
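A scalar sketch of the sample autocovariance is given below. It assumes the common convention for a zero-mean series, namely a sum of lagged products over $t = h+1, \ldots, n$ divided by $n$ with no mean correction; the function name is hypothetical.

```python
def sample_autocov(us, h):
    """Sample autocovariance at lag h >= 0 for a zero-mean scalar series.

    Uses the divisor n (not n - h) and no mean correction, as is usual
    when the series is known to have mean zero.
    """
    n = len(us)
    return sum(us[t] * us[t - h] for t in range(h, n)) / n


series = [1.0, 2.0, 3.0, 4.0]
c0, c1, c2 = (sample_autocov(series, h) for h in range(3))
```

For this toy series the three values are 7.5, 5.0 and 2.75. Dividing by $n$ rather than $n - h$ introduces a bias that vanishes as $n$ grows.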

Theorem 4.1 Let $u_t$ be generated by a linear process and suppose the fourth
cumulant of $e_t$ is finite. Let $c_{ij}(h)$ and $\gamma_{ij}(h)$ denote the $ij$th
element of $C_n(h)$ and $\Gamma(h)$, respectively. Then (a) $C_n(h) \to \Gamma(h)$
a.s. as $n \to \infty$ for any $h$, and (b) for any integer $H > 0$ and integers
$\ell(h)$, the joint law of

$$ \sqrt{n}\,\{c_{ij}(\ell(h)) - \gamma_{ij}(\ell(h))\}, \qquad i,j = 1, \ldots, p;\ h = 1, \ldots, H, $$

converges ($n \to \infty$) to that of a zero-mean normal with asymptotic covariances

$$ n\,\mathrm{Cov}(c_{ij}(m), c_{kl}(h)) \to \sum_{r=-\infty}^{\infty}
   \{\gamma_{ik}(r)\gamma_{jl}(r+h-m) + \gamma_{il}(r+h)\gamma_{jk}(r-m)
   + \kappa_{ijkl}(0,m,r,r+h)\} \qquad (4.5) $$

where $\kappa_{ijkl}$ is the fourth cumulant function of $u_t$;
$m, h \in \{\ell(1), \ldots, \ell(H)\}$. The fact that $\kappa_{ijkl}$ is absolutely
summable follows from the finiteness of the fourth cumulants of $e_t$ (cf. Hannan
(1970) p. 211 for details). Note that if $u_t$ is Gaussian the fourth cumulants
vanish.

It is clear from Theorem 4.1 that since $y_t$ given by (1.1) is generated by a
linear process (cf. 2.3), the estimates given in (4.1), (4.2), and (4.3) are in
fact strongly consistent: simply note that

$$ \Gamma(h) = \Phi^{h} \sum_{j=0}^{\infty} \Phi^{j} Q \Phi'^{j} + \delta_{h}^{0} R, \qquad h \ge 0, $$

where $\delta$ is the Kronecker $\delta$. Moreover, since the estimates are linear
combinations of the asymptotically jointly normal variates $C_n(h)$, it is clear
that $\sqrt{n}(\hat\Phi_n - \Phi)$, $\sqrt{n}(\hat Q_n - Q)$ and
$\sqrt{n}(\hat R_n - R)$ are asymptotically normal with covariance matrices
determined via (4.5). This establishes the desired rate of convergence needed for
the Newton-Raphson. That is, $(\hat\Phi_n - \Phi)$, $(\hat Q_n - Q)$ and
$(\hat R_n - R)$ are all $O_p(n^{-1/2})$. We note that we shall use the same order
notation, $o_p$ and $O_p$, for matrix as well as vector variates; no confusion
should arise from this.
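As a worked scalar illustration (not part of the original derivation), take $p = 1$ with $0 < |\Phi| < 1$. Under the model (1.1), (1.2) the stationary signal has variance $Q/(1-\Phi^2)$, and the observation noise contributes only at lag zero, so

$$ \gamma(0) = \frac{Q}{1-\Phi^2} + R, \qquad \gamma(h) = \frac{\Phi^{h} Q}{1-\Phi^2}, \quad h \ge 1. $$

It follows that

$$ \frac{\gamma(2)}{\gamma(1)} = \Phi, \qquad R = \gamma(0) - \frac{\gamma(1)}{\Phi}, \qquad Q = \frac{(1-\Phi^2)\,\gamma(1)}{\Phi}, $$

which shows how the three parameters are recovered from autocovariances at lags 0, 1 and 2. With the assumed values $\Phi = 0.8$, $Q = 1$, $R = 0.5$, one gets $\gamma(0) \approx 3.278$, $\gamma(1) \approx 2.222$, $\gamma(2) \approx 1.778$, and indeed $\gamma(2)/\gamma(1) = 0.8$.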

4.2  The  Newton-Raphson  Procedure 

In this subsection we demonstrate the soundness of the Newton-Raphson
procedure for the given KF model. The techniques used in this section will also
help us establish the bootstrap principle in the next section.

So that we may explicitly exhibit a Newton-Raphson iteration we reparametrize
the problem. Let $P$ be as defined in (2.1b) and let $W = (P+R)^{-1}$ be the inverse
of the covariance matrix of an innovation. We then consider the problem of
estimating $(\Phi, P, W)$ via Newton-Raphson. Note that our original parameters
$(\Phi, Q, R)$ are easily identified from $(\Phi, P, W)$, namely
$Q = P - \Phi[P - PWP]\Phi'$ (cf. 2.1b) and $R = W^{-1} - P$. In this manner we may
write (2.3) as

$$ y_t = \sum_{j=1}^{t-1} \Phi^{j} P W r_{t-j} + r_t + \Phi^{t} x_0^{0} \qquad (4.6) $$

where we have dropped the superscript $t-1$ from the $r_t$'s.
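The reparametrization can be checked with a scalar round trip: map $(\Phi, Q, R)$ to $(\Phi, P, W)$ by solving the steady-state equation for $P$, then map back using $Q = P - \Phi[P - PWP]\Phi'$ and $R = W^{-1} - P$. The function names, tolerance, and iteration scheme below are assumptions of this sketch.

```python
def to_pw(phi, q, r, tol=1e-13, max_iter=100000):
    """Map (phi, q, r) to (phi, P, W) for p = 1 by fixed-point iteration on P."""
    p = q
    for _ in range(max_iter):
        p_new = phi * phi * (p - p * p / (p + r)) + q
        converged = abs(p_new - p) < tol
        p = p_new
        if converged:
            break
    return phi, p, 1.0 / (p + r)  # W is the inverse innovation variance


def to_qr(phi, p, w):
    """Map (phi, P, W) back to (phi, Q, R) for p = 1."""
    q = p - phi * (p - p * w * p) * phi
    r = 1.0 / w - p
    return phi, q, r


phi, p, w = to_pw(0.8, 1.0, 0.5)
_, q_back, r_back = to_qr(phi, p, w)
```

Up to the tolerance of the fixed-point iteration, `q_back` and `r_back` reproduce the inputs 1.0 and 0.5.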

Let $\theta^\circ$ be the $k \times 1$ vector containing the distinct parameters of
$(\Phi, P, W)$ and note that $k = p(2p+1)$ since $\Phi$ contains $p^2$, $P$ contains
$p(p+1)/2$, and $W$ contains $p(p+1)/2$ distinct parameters. The Newton-Raphson
procedure considers minimizing

$$ Q(\theta) = n^{-1} \sum_{t=1}^{n}
   \{\hat W^{1/2}[\hat r_t - \hat G_t(\theta - \hat\theta)]\}'
   \{\hat W^{1/2}[\hat r_t - \hat G_t(\theta - \hat\theta)]\} \qquad (4.7) $$

where $\hat\theta$ is an initial estimate of $\theta^\circ$,
$\hat r_t = y_t - \hat x_t^{t-1}$ where $\hat x_t^{t-1}$ is obtained by running a KF
under parameter estimates $\hat\theta$, and $\hat G_t$ is the $p \times k$ matrix of
partials $\partial r_t/\partial\theta$ evaluated at $\theta = \hat\theta$.
$\hat G_t$ may be explicitly obtained via equation (4.6) by considering the model
in canonical form. Specifically, let $E$ be the nonsingular matrix for which
$E^{-1}\Phi E$ is block diagonal. Then, let $s_t = E^{-1}x_t$, $e_t = E^{-1}w_t$,
$z_t = E^{-1}y_t$, and $\eta_t = E^{-1}v_t$, in which case we transform the model
(1.1), (1.2) to

$$ z_t = s_t + \eta_t, $$

$$ s_t = \Lambda s_{t-1} + e_t, $$

where $\Lambda = E^{-1}\Phi E$ is block diagonal. Then we may consider writing $z_t$
in the form of (4.6) in which case $\hat G_t$ has a nice form (see Fuller (1976)
p. 49).

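In view of (4.7), the one-step Newton-Raphson estimate of $\theta^\circ$ is given by

$$ \tilde\theta = \hat\theta
   + \Big[n^{-1} \sum_{t=1}^{n} \hat G_t' \hat W \hat G_t\Big]^{-1}
     \Big[n^{-1} \sum_{t=1}^{n} \hat G_t' \hat W \hat r_t\Big]. \qquad (4.8) $$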


In the examples of Section 6, we shall consider the univariate case, $p = 1$.
Thus, it is worthwhile to give the explicit Newton-Raphson procedure for that case
here:

1. Estimate $\Phi$, $Q$ and $R$ via (4.1), (4.2) and (4.3), respectively. Call the
estimates $\hat\Phi$, $\hat Q$ and $\hat R$.

2. Run a KF under $\hat\Phi$, $\hat Q$, $\hat R$, and $x_0^0 = 0$ to obtain
$\hat P$, $\hat K$, and $\hat x_1^0, \ldots, \hat x_n^{n-1}$. Obtain the
innovations $\hat r_t = y_t - \hat x_t^{t-1}$; $t = 1, \ldots, n$.


3.  Calculate  the  partials  via  (4.6) 
r  -  r_  Vt-1  ,lj-loA' 


- 1]:[  -  ijii 

for $t = 2, \ldots, n$, $\hat G_1 = [0, 0, 0]$; where $\hat W = (\hat P + \hat R)^{-1}$.


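4. Update via (4.8):

$$ (\tilde\Phi, \tilde P, \tilde W)' = (\hat\Phi, \hat P, \hat W)'
   + \Big[\sum_{t=2}^{n} \hat G_t' \hat G_t\Big]^{-1}
     \Big[\sum_{t=2}^{n} \hat G_t' \hat r_t\Big]. $$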
We now establish the asymptotic properties of the Newton-Raphson estimate
$\tilde\theta$ given in (4.8).


Theorem 4.2. Let $\Phi$, $Q$, and $R$ satisfy assumptions (A1) and (A2) given in
Section 2. Let $\hat\Phi_n$, $\hat Q_n$, and $\hat R_n$ be the initial consistent
estimates given in (4.1), (4.2), and (4.3), respectively, and let $\hat\theta$, as
defined above, consist of these estimates. Further, let $\hat W$ be an estimator of
$W$ which is bounded in probability, and assume that $r_t$ has finite fourth
cumulant. Then

$$ \sqrt{n}\,(\tilde\theta - \theta^\circ) \xrightarrow{\mathcal{L}} N_k(0, B^{-1}(\theta^\circ)) $$

where $\tilde\theta$ is defined in (4.8), $k = p(2p+1)$, and
$B(\theta^\circ) = \operatorname{plim}_{n \to \infty} n^{-1} \sum_{t=1}^{n} \hat G_t' \hat W \hat G_t$,
where $\hat G_t$ and $\hat W$ are defined in (4.7).


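Proof. The proof parallels the proof of Fuller (1976, Theorem 8.3.1). See also
Fuller (1976, Theorem 5.5.1 and Corollary 5.5.1). One must simply note that the
elements of the matrices of first, second and third order partials of $r_t$ with
respect to $\theta$ converge to linear processes as $t \to \infty$ and Theorem 4.1a
applies. One may then show that

$$ (\tilde\theta - \theta^\circ) =
   \Big[n^{-1} \sum_{t=1}^{n} G_t^{\circ\prime} W G_t^{\circ}\Big]^{-1}
   \Big(n^{-1} \sum_{t=1}^{n} G_t^{\circ\prime} W r_t\Big) + o_p(n^{-1/2}) $$

where $G_t^{\circ}$ is the $p \times k$ matrix $\partial r_t/\partial\theta$
evaluated at $\theta = \theta^\circ$, and that

$$ n^{-1/2} \sum_{t=1}^{n} G_t^{\circ\prime} W r_t \xrightarrow{\mathcal{L}} N_k(0, B(\theta^\circ)). $$

The result of the theorem then follows.

□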
5.  The Bootstrap Principle

In this section we justify the techniques established in Section 3. Throughout
this section we replace the normality assumptions with the assumption that the
noise processes $w_t$ and $v_t$ have finite fourth moments so that the observations
$y_t$ satisfy the conditions of Theorem 4.1. As in the previous section we drop the
superscript $t-1$ from the innovations $r_t$.

Before proving the bootstrap principle given in Section 3, we state the following
useful lemmas. First some notation is needed. If $R^p$ is a $p$-dimensional space
equipped with the Euclidean norm $|\cdot|$ and $\alpha \ge 1$, then
$d_\alpha^p(\mu, \nu)$ is the distance between probability measures $\mu$ and $\nu$
in $R^p$ defined as the infimum of $E\{|U - V|^{\alpha}\}^{1/\alpha}$ over all pairs
of random vectors $U$ with law $\mu$ and $V$ with law $\nu$ (cf. Bickel and
Freedman (1981)).
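For intuition, this distance is easy to compute between two empirical distributions on the real line with the same number of points: for $\alpha \ge 1$ the infimum is attained by pairing the sorted samples. The sketch below covers only that special case ($p = 1$, equal sample sizes); the function name is an assumption.

```python
def mallows_distance(xs, ys, alpha=2.0):
    """d_alpha between two empirical distributions on the line, equal sizes.

    For alpha >= 1 in one dimension, the optimal coupling pairs order
    statistics, so sorting both samples gives the infimum directly.
    """
    if len(xs) != len(ys):
        raise ValueError("equal sample sizes are assumed in this sketch")
    pairs = zip(sorted(xs), sorted(ys))
    mean_cost = sum(abs(a - b) ** alpha for a, b in pairs) / len(xs)
    return mean_cost ** (1.0 / alpha)


shift = mallows_distance([0.0, 1.0], [1.0, 2.0])          # a pure shift by 1
same = mallows_distance([3.0, 1.0, 2.0], [2.0, 3.0, 1.0])  # same points, reordered
```

A pure location shift by 1 gives distance 1, and two samples containing the same points in a different order are at distance 0.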


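Lemma 5.1 Let $\Theta = (\Phi, Q, R)$ and
$\hat\Theta_n = (\hat\Phi_n, \hat Q_n, \hat R_n)$ satisfy the conditions of Result
4.2. Let $\hat F_n$ be the empirical distribution function (e.d.f.) of the
suboptimal innovations $\hat r_t$, $t = 1, \ldots, n$, generated by $\hat\Theta_n$
and let $F_n$ be the e.d.f. of the optimal innovations $r_t$, $t = 1, \ldots, n$,
generated by $\Theta$. Then $d_4^p(\hat F_n, F_n) \to 0$ almost surely (a.s.) as
$n \to \infty$.

Proof Noting that $r_t = y_t - x_t^{t-1}$ and $\hat r_t = y_t - \hat x_t^{t-1}$, in
view of Result 4.2, we have

$$ d_4^p(\hat F_n, F_n)^4 \le n^{-1} \sum_{t=1}^{n} |\hat r_t - r_t|^4
   = n^{-1} \sum_{t=1}^{n} |x_t^{t-1} - \hat x_t^{t-1}|^4 \to 0 \quad \text{a.s.} $$

as $n \to \infty$.

□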


Lemma 5.2 Let $F_n$ be the e.d.f. of the optimal innovations, $r_t$,
$t = 1, \ldots, n$, and let $F$ be the common distribution of $r_t$. Then
$d_4^p(F_n, F) \to 0$ a.s. as $n \to \infty$.

Proof Since the optimal steady-state innovations are iid (cf. Lemma 2.1)
with finite fourth moments, this follows from Lemma 8.4 of Bickel and Freedman
(1981). □

Now, let $C_n(h)$ be the sample autocovariance function of the observations
$y_1, \ldots, y_n$ (cf. 4.4). Let $\psi_{n,h}(F)$ be the law of $C_n(h)$ when the
law of $r_t$ is $F$. Metrize the $\psi$'s by $d_1^{p^2}$ and the $F$'s by $d_2^p$.
Then we have the following lemma, which is similar to Freedman (1984, Lemma 6.3);
however, for the sake of completeness, we provide a proof.


Lemma 5.3 The $\psi_{n,h}(F)$ are equiuniformly continuous functions of $F$ on

$$ S = \Big\{F : \int_{R^p} |r|^2\, dF(r) \le a^2 < \infty\Big\}. $$

Proof. Fix $F$ and $F^*$ in $S$. Construct iid random vectors $r_t$, $r_t^*$;
$t = 1, \ldots, n$, so that $r_t$ has law $F$ and $r_t^*$ has law $F^*$, and

$$ d_2^p(F, F^*)^2 = E\{|r_t - r_t^*|^2\}. $$

See Bickel and Freedman (1981, Lemma 8.1). Build $y_t$ from the $r_t$ and $y_t^*$
from the $r_t^*$ as in (2.3). Then, for $h \ge 0$,

$$ E\{|y_t y_{t-h}' - y_t^* y_{t-h}^{*\prime}|\} \le
   E\{|y_t| \cdot |y_{t-h} - y_{t-h}^*|\} + E\{|y_t - y_t^*| \cdot |y_{t-h}^*|\}. \qquad (5.1) $$

We concentrate on the first term in (5.1), the second being treated similarly.

Now, by the Cauchy-Schwarz inequality and the fact that $y_t$ is a linear process

Eily^l • |yg-y*|}^  1  E{|y^|^}  E{|yg-y*|^} 

1  E{  |y  -yV}. 


Using  the  fact  that  if  are  independent  random  vectors,  then 

E{|EjUjl^}  <  EitUjI^}  +  llj 


we  have  that  In  view  of  equation  (2.3) 


|y,-y*l^)  -  EdIJ.i  W's-j-'s-j* 


where 


<  E{l5  1^}  +  £{[6  1^}  +  1e{6  >1^  [(^)  +  1]^  (5.2) 

1-p^ 


6  »  r^-r, ,  p  «  [♦11  <  1,  and  k 


It is clear that (5.2) is small if $F$ and $F^*$ are close in $d_2^p$. □

Now, let starred variables denote those obtained via the bootstrap sample
$y_1^*, \ldots, y_n^*$. In this manner we denote

$$ C_n^*(h) = n^{-1} \sum_{t=h+1}^{n} y_t^* y_{t-h}^{*\prime}, \qquad h \ge 0, $$

as the bootstrap counterpart of $C_n(h)$. Furthermore, let $E$, $E^*$ denote
expectation under $F$, $\hat F_n$, respectively. Let
$Z_{ij}(t,h) = y_{ti} y_{t-h,j} - E(y_{ti} y_{t-h,j})$ and
$Z_{ij}^*(t,h) = y_{ti}^* y_{t-h,j}^* - E^*(y_{ti}^* y_{t-h,j}^*)$,
$i,j = 1, \ldots, p$; $h = 0, 1, \ldots$. Then
$Z_{ij}(h) = n^{-1} \sum_{t} Z_{ij}(t,h)$ will have the same ergodic properties as
$\{c_{ij}(h) - \gamma_{ij}(h)\}$ (Hannan (1970) calls the matrix whose elements are
$Z_{ij}(h)$, $C_n(h) - \Gamma(h)$, and shows that this $C_n(h)$ and $C_n(h)$ as we
have defined it have the same limiting properties). For details see Hannan (1970,
p. 208 and p. 228).

We  now  state  the  following  theorem. 


Theorem 5.1. Let $y_t$ satisfy the conditions of Theorem 4.1. Then, along almost
all sample sequences, as $n \to \infty$, conditionally on the data, for any integer
$H > 0$ and integers $\ell(h)$,

(1) $C_n^*(\ell(h)) \to \Gamma(\ell(h))$ in conditional probability, and

(2) the joint conditional law of $\sqrt{n}\, Z_{ij}^*(\ell(h))$ merges with the joint
unconditional law of $\sqrt{n}\, Z_{ij}(\ell(h))$, $i,j = 1, \ldots, p$;
$h = 1, 2, \ldots, H$.

Proof. The proof of part (1) follows from Lemmas 5.1, 5.2, and 5.3. That is, the
conditional law of $C_n^*(h)$ given the data differs little in the sense of
$d_1^{p^2}$ from the unconditional law of $C_n(h)$ by Lemma 5.3, because the e.d.f.
of the suboptimal innovations, $\hat F_n$, differs little in the sense of $d_2^p$
from the law of the optimal innovations, $F$, by the combination of Lemmas 5.1 and
5.2.

We prove part (2) by showing that as $n \to \infty$ the joint conditional law of
$\sqrt{n}\, Z_{ij}^*(\ell(h))$ is the joint law described in Theorem 4.1.

Let $r_t^*$ be iid $\hat F_n$ (appropriately centered) and let $y_t^*$ be generated
by (cf. 2.3)

$$ y_t^* = \sum_{j=1}^{\infty} \hat\Phi_n^{j} \hat K_n r_{t-j}^* + r_t^* $$

where $\hat\Phi_n$ and $\hat K_n$ are the consistent estimates of $\Phi$ and $K$
described in Section 4 such that $\rho(\hat\Phi_n) < 1$. For convenience, define
$p \times p$ matrices $A(j) = \hat\Phi_n^{j} \hat K_n$, $j \ge 1$, $A(0) = I$. Then
for all $h$,

$$ E^*\{y_t^* y_{t-h}^{*\prime}\}
   = E^*\Big\{\Big(\sum_{j=0}^{\infty} A(j) r_{t-j}^*\Big)\Big(\sum_{j=0}^{\infty} A(j) r_{t-h-j}^*\Big)'\Big\}
   = \sum_{j=0}^{\infty} A(j+h)\, E^*(r_t^* r_t^{*\prime})\, A'(j)
   = \sum_{j=0}^{\infty} A(j+h) \Big(n^{-1} \sum_{m=1}^{n} \hat r_m \hat r_m'\Big) A'(j). \qquad (5.3) $$

Since by Lemmas 5.1 and 5.2, $d_2^p(\hat F_n, F) \to 0$ a.s. as $n \to \infty$, it
follows that given the data, $E^*(r_t^* r_t^{*\prime}) \to \mathrm{Cov}(r_t) = P + R$
a.s. as $n \to \infty$ (cf. Bickel and Freedman (1981), Lemma 8.3). Hence, we
conclude from (5.3) that conditional on the data, as $n \to \infty$,
$E^*(y_t^* y_{t-h}^{*\prime}) \to \Gamma(h)$, all $h$.

Note that for $a,b,c,d,i,j,k,\ell = 1, \ldots, p$, and
$t,u,v,w = 0, \pm 1, \pm 2, \ldots$,

$$ E^*(r_{ti}^* r_{uj}^* r_{vk}^* r_{w\ell}^*) =
\begin{cases}
 n^{-1} \sum_{m=1}^{n} \hat r_{mi} \hat r_{mj} \hat r_{mk} \hat r_{m\ell}, & t = u = v = w, \\
 \big(n^{-1} \sum_{m=1}^{n} \hat r_{ma} \hat r_{mb}\big)\big(n^{-1} \sum_{m=1}^{n} \hat r_{mc} \hat r_{md}\big), & t,u,v,w \text{ equal in pairs but not all equal}, \\
 0, & \text{otherwise},
\end{cases} $$

where $a,b$ and $c,d$ correspond to the pairs of subscripts which are equal (e.g.,
if $t = u$ and $v = w$, then $a = i$, $b = j$, $c = k$, $d = \ell$). It is clear that
the fourth cumulants of the $r_t^*$ are finite and hence, so is the fourth cumulant
function of the $y_t^*$. Thus, the $y_t^*$ satisfy the conditions of Theorem 4.1.
That is, if $y_1^*, \ldots, y_m^*$ is a (bootstrap) sample of size $m$, the joint
law of $\sqrt{m}\, Z_{ij}^*(\ell(h))$, $i,j = 1, \ldots, p$; $h = 1, \ldots, H$,
$H > 0$ integer, converges ($m \to \infty$) to a zero-mean normal law with
asymptotic covariances evaluated as in (4.5). Hence, part (2) follows if the
conditional moments $\gamma_{ij}^*(t-s) = E^*(y_{ti}^* y_{sj}^*)$ and fourth
cumulants $\kappa_{ijkl}^*(0,t,u,v)$ converge a.s. to the unconditional values
$\gamma_{ij}(t-s)$ and $\kappa_{ijkl}(0,t,u,v)$, respectively. We have already seen
that $\gamma_{ij}^*(t-s) \to \gamma_{ij}(t-s)$ a.s. as $n \to \infty$. Also, by
Lemmas 5.1 and 5.2, $d_4^p(\hat F_n, F) \to 0$ a.s. as $n \to \infty$, from which it
follows that the fourth conditional moments of $y_t^*$ converge ($n \to \infty$)
a.s. to the fourth moments of $y_t$ (cf. Bickel and Freedman (1981), Lemma 8.3),
which completes the proof. □

Let $\hat\Phi_n^*$, $\hat Q_n^*$, and $\hat R_n^*$ denote the bootstrap initial
estimates of $\Phi$, $Q$ and $R$, respectively, obtained by evaluating (4.1), (4.2),
and (4.3), respectively, with $y_t$ replaced by $y_t^*$. In view of Theorem 5.1, we
have that $\hat\Phi_n^*$, $\hat Q_n^*$, and $\hat R_n^*$ are consistent for $\Phi$,
$Q$, and $R$, respectively, in conditional probability with the desired convergence
rate of $o_p^*(n^{-1/4})$. By $o_p^*(n^{-1/4})$ we mean a variate which is of
smaller order in conditional probability than $n^{-1/4}$. These facts, of course,
parallel those given in Section 4.1 for the original data $y_t$.




Next, we establish the appropriate asymptotics for the Newton-Raphson procedure involving the bootstrap data $y_t^*$, paralleling the results of Section 4.2 for the original data $y_t$. Recall that in the realm of the bootstrap the data $y_t^*$ are generated via (cf. 3.1)

^t  ’  ^j-1  *  ^  ^  ^5.4) 

where $\tilde\Phi$ and $\tilde K = \tilde P \tilde W$ are the Newton-Raphson estimates obtained via (4.8); $r_t = y_t - \tilde x_t^{t-1}$, where $\tilde x_t^{t-1}$ is obtained via the KF under parameters $\tilde\theta$, and $r_t^*$ is obtained, as described in Section 3, by resampling the $r_t$.

Let $\theta$ be the $k \times 1$ vector containing the appropriate elements of $\Phi$, $P$, and $W$. The Newton-Raphson estimate of $\theta$ based on a bootstrap sample of size $n$ is thus


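$$\tilde\theta^* = \hat\theta^* - \Big[ n^{-1} \sum_{t=1}^{n} \hat G_t^{*\prime} \hat W^* \hat G_t^* \Big]^{-1} n^{-1} \sum_{t=1}^{n} \hat G_t^{*\prime} \hat W^* \hat r_t^* \qquad (5.5)$$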


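where $\hat r_t^* = y_t^* - \hat x_t^{*\,t-1}$, with $\hat x_t^{*\,t-1}$ obtained via filtering under parameter estimates $\hat\theta^*$, and $\hat G_t^*$ is the $p \times k$ matrix of partials $\partial r_t^*/\partial\theta$ evaluated at $\theta = \hat\theta^*$. We now state the bootstrap principle in the following theorem.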


Theorem 5.2 Let $\tilde\theta^*$ be the $k \times 1$ bootstrap estimate given in (5.5) and let $\tilde\theta$ be given by (4.8). Then, along almost all sample sequences, as $n \to \infty$, conditional on the data, the law of $\sqrt{n}(\tilde\theta^* - \tilde\theta)$ merges with the law of $\sqrt{n}(\tilde\theta - \theta^{\circ})$ as given by Theorem 4.2.


Proof. The proof of this theorem will parallel that of Theorem 4.2. That is, in view of (5.4) the elements of the matrices of the first, second, and third order partials of $r_t^*$ with respect to $\theta$ are linear processes and we may show that conditional on the data


1"'“  Imi  KK'  * 


(5.6) 
since $\hat\Phi^*$, $\hat Q^*$, and $\hat R^*$ have the desired convergence rate previously established. In (5.6), $G_t^{\circ}$ is the $p \times k$ matrix $\partial r_t/\partial\theta$ evaluated at $\theta = \theta^{\circ}$. Moreover, since $(\tilde\theta - \theta^{\circ})$

( 

P 


is  o^(n  and  n  ^  1^=1  have  that  conditional  on  the  data 


(9*  -  9)  =  [n‘-^  G°’WG°]‘^  (n'-^  ^^t^ 


t  P 

-1  rn  ^o'„^Oi-l  ,  -1  rn 


Next, following Lemmas 5.1 and 5.2 we may show that $d_4^p(F_n,F) \to 0$ a.s. as $n \to \infty$, where $F_n$ is the e.d.f. of the $r_t$ and $F$ is the common distribution of $r_t$. Hence we may easily establish (by paralleling Theorem 4.2) that conditional on the data, as $n \to \infty$,

$$n^{-1/2} \sum_{t=1}^{n} G_t^{\circ\prime} W r_t^* \;\xrightarrow{L}\; N_k(0, B(\theta^{\circ}))$$

from which, in view of (5.7), the theorem follows. □


6.  Examples 

In  this  section  we  submit  two  examples.  The  first  example  considers  the 
bootstrap  for  the  univariate  KF  model  when  the  noise  processes  are  Gaussian.  In 
the  second  example,  we  consider  the  case  when  the  noise  processes  are  contaminated 
normals.  For  the  sake  of  clarity,  we  first  provide  the  step-by-step  bootstrap 
procedure  for  the  KF: 

Given the data $y_1,\ldots,y_n$:

1. Obtain the initial consistent estimates $\hat\Phi$, $\hat Q$, and $\hat R$ via equations (4.1), (4.2) and (4.3), respectively.

2. Filter (cf. 2.1) under $\hat\Phi$, $\hat Q$, and $\hat R$ to obtain $\hat P$, $\hat K$, and $\hat r_t = y_t - \hat x_t^{t-1}$, $t = 1,\ldots,n$.

3. Obtain the Newton-Raphson estimates $\tilde\theta$ via (4.8). Also, see the discussion below (4.8).

4. Filter under $\tilde\theta$ and obtain the innovations $\tilde r_t = y_t - \tilde x_t^{t-1}$, $t = 1,\ldots,n$. Center the $\tilde r_t$.

5. Sample with replacement, $n$ times, from $\{\tilde r_1,\ldots,\tilde r_n\}$ to obtain $\{r_1^*,\ldots,r_n^*\}$.

6. Obtain the bootstrap data $y_t^*$ via (3.1).

7. Repeat steps 1, 2 and 3 using the bootstrap data, yielding $\tilde\theta_1^*$, the first bootstrap estimate of $\theta$.

8. Repeat steps 5, 6, and 7 a large number "L" of times to obtain $\tilde\theta_1^*,\ldots,\tilde\theta_L^*$.
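The resampling part of the procedure (steps 4 to 6) can be sketched in Python for the univariate steady-state case. This is only an illustration under stated assumptions, not the paper's implementation. Equation (3.1) is not reproduced here, so the sketch assumes the bootstrap data are rebuilt by the innovations-form recursion $y_t = x_t^{t-1} + r_t$, $x_{t+1}^{t} = \Phi(x_t^{t-1} + K r_t)$. The function names and the start-up value $x_1^0 = 0$ are hypothetical. The estimation steps (1 to 3 and 7) are omitted, so `phi` and `k` are taken as given.

```python
import numpy as np

def steady_state_gain(phi, q, r, iters=1000):
    """Iterate the scalar Riccati recursion to the steady-state one-step
    prediction variance P; return (P, K) with gain K = P / (P + R)."""
    p = q
    for _ in range(iters):
        p = phi**2 * p * r / (p + r) + q
    return p, p / (p + r)

def innovations(y, phi, k):
    """Innovations r_t = y_t - x_t^{t-1} from the steady-state filter."""
    r = np.empty(len(y))
    x_pred = 0.0  # assumed start-up value x_1^0 = 0 (zero-mean signal)
    for t in range(len(y)):
        r[t] = y[t] - x_pred
        x_pred = phi * (x_pred + k * r[t])  # x_{t+1}^t = phi * x_t^t
    return r

def rebuild(r, phi, k):
    """Run the same recursion forward to turn innovations into data."""
    y = np.empty(len(r))
    x_pred = 0.0
    for t in range(len(r)):
        y[t] = x_pred + r[t]
        x_pred = phi * (x_pred + k * r[t])
    return y

def bootstrap_data(y, phi, k, rng):
    """Steps 4-6: centre the innovations, resample them, rebuild y*."""
    r = innovations(y, phi, k)
    r = r - r.mean()
    r_star = rng.choice(r, size=len(r), replace=True)
    return rebuild(r_star, phi, k)

rng = np.random.default_rng(1)
p, k = steady_state_gain(0.8, 4.0, 1.0)
y = rng.normal(size=250)  # placeholder data, for illustration only
y_star = bootstrap_data(y, 0.8, k, rng)
```

Because `rebuild` inverts `innovations`, feeding the unresampled innovations back through it reproduces the original series, which is a convenient check that the filter and the data-generating recursion agree.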

Example 6.1 In this example we generated $n = 250$ Gaussian observations from the KF model (1.1), (1.2) with parameters $\Phi = 0.8$, $Q = 4.0$, and $R = 1.0$. The one-step Newton-Raphson estimates were then bootstrapped $L = 250$ times and we compared the Newton-Raphson estimate of the standard error of $\tilde\Phi$, based on the asymptotics of Theorem 4.2, to the bootstrap estimate of the standard error of $\tilde\Phi$. The summary results of 30 such runs are given in Table 6.1. Also included in Table 6.1 is the empirical standard error of the Newton-Raphson estimate of $\Phi$ obtained from 2000 generated samples of length $n = 250$ observations from the model.
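As a rough illustration of the data-generating step in this example, the following Python sketch draws a univariate series from the model (1.1), (1.2) with the stated parameter values. The function name, the seed, and the burn-in length (used here to approximate the steady-state assumption) are hypothetical choices, not details given in the text.

```python
import numpy as np

def simulate_kf_model(n, phi, q, r, rng, burn_in=200):
    """Draw y_1..y_n from y_t = x_t + v_t, x_t = phi * x_{t-1} + w_t
    (scalar case), discarding an assumed burn-in period."""
    total = n + burn_in
    w = rng.normal(0.0, np.sqrt(q), size=total)  # state noise, variance Q
    v = rng.normal(0.0, np.sqrt(r), size=total)  # observation noise, variance R
    x = np.zeros(total)
    for t in range(1, total):
        x[t] = phi * x[t - 1] + w[t]
    return (x + v)[burn_in:]

rng = np.random.default_rng(0)
y = simulate_kf_model(250, 0.8, 4.0, 1.0, rng)

# Stationary variance of y_t implied by the model: Q / (1 - phi^2) + R
var_y = 4.0 / (1.0 - 0.8**2) + 1.0
```

The implied stationary variance gives a quick sanity check on a simulated series before any estimation is attempted.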


TABLE 6.1

Standard Error of $\tilde\Phi$

                      Mean               Standard Deviation    Bias
Bootstrap (a, b)      3.933 × 10^-3      0.651 × 10^-3         -0.166 × 10^[?] (d)
Newton-Raphson (b)    1.380 × 10^[?]     0.479 × 10^-3
Empirical (c)         3.605 × 10^-3                            -0.026 × 10^[?] (e)

Table 6.1: Summary of the estimates of the standard error of the Newton-Raphson estimate of $\Phi$ in the KF model with Gaussian noise for samples of length $n = 250$. Exponents marked [?] are illegible in the source.

a: Based on L = 250 replications

b: Based on 30 runs

c: Based on 2000 runs

d: Average bias relative to the corresponding Newton-Raphson estimate

e: Bias relative to the true value


Example 6.2 In this example we generated $n = 250$ contaminated normal observations from the KF model (1.1), (1.2) with parameters $\Phi = 0.8$, $Q = 4.0\ (90\%) + 16.0\ (10\%) = 5.2$ and $R = 1.0\ (90\%) + 9.0\ (10\%) = 1.8$. That is, the state noise is N(0,4) with probability 90% and N(0,16) with probability 10%, while the observation noise is N(0,1) with probability 90% and N(0,9) with probability 10%. The one-step Newton-Raphson estimates were then bootstrapped $L = 250$ times and we compared the estimates of the standard error of the state transition parameter estimate as in Example 6.1. Table 6.2 gives the summary of 30 runs and compares these with the empirical standard error based on 2000 runs.
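The contaminated normal noise described above can be sketched as a two-component scale mixture. The Python fragment below is a minimal illustration. The function names are hypothetical, and the text does not say how the mixture was actually sampled. It also confirms the arithmetic behind the overall variances 5.2 and 1.8.

```python
import numpy as np

def contaminated_normal(size, var_main, var_contam, eps, rng):
    """Draw from the mixture (1 - eps) * N(0, var_main) + eps * N(0, var_contam)."""
    contaminated = rng.random(size) < eps  # True with probability eps
    sd = np.where(contaminated, np.sqrt(var_contam), np.sqrt(var_main))
    return rng.normal(0.0, 1.0, size) * sd

def mixture_variance(var_main, var_contam, eps):
    """Variance of the zero-mean mixture: a weighted average of the variances."""
    return (1.0 - eps) * var_main + eps * var_contam

rng = np.random.default_rng(0)
w = contaminated_normal(250, 4.0, 16.0, 0.10, rng)  # state noise
v = contaminated_normal(250, 1.0, 9.0, 0.10, rng)   # observation noise

q_total = mixture_variance(4.0, 16.0, 0.10)  # 0.9 * 4 + 0.1 * 16
r_total = mixture_variance(1.0, 9.0, 0.10)   # 0.9 * 1 + 0.1 * 9
```

Since both components have mean zero, the mixture variance is simply the probability-weighted average of the component variances, which is how the values of $Q$ and $R$ quoted in the example arise.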

TABLE 6.2

Standard Error of $\tilde\Phi$

                      Mean               Standard Deviation    Bias
Bootstrap (a, b)      4.662 × 10^[?]     1.044 × 10^[?]        -0.464 × 10^[?] (d)
Newton-Raphson (b)    1.871 × 10^[?]     0.778 × 10^[?]
Empirical (c)         4.197 × 10^[?]                           -0.068 × 10^[?] (e)

Table 6.2: Summary of the estimates of the standard error of the Newton-Raphson estimate of $\Phi$ in the KF model with contaminated Gaussian noise for samples of length $n = 250$. Exponents marked [?] are illegible in the source.

a: Based on L = 250 replications

b: Based on 30 runs

c: Based on 2000 runs

d: Average bias relative to the corresponding Newton-Raphson estimate

e: Bias relative to the true value
In each example, the advantage of the bootstrap is clear. In both examples, the bootstrap estimate of the standard error of $\tilde\Phi$ tended to be slightly larger than the empirical standard error, whereas the standard error of $\tilde\Phi$ obtained via the large sample theory of the Newton-Raphson was always considerably smaller than the empirical value. Thus, the bootstrap has the desired property that the confidence and prediction regions obtained via the bootstrap will tend to be conservative. The bootstrap is clearly a perfect complement to the Newton-Raphson procedure.